rtftohtml Change History
The latest version of this document and other information about
RTFtoHTML is kept at
http://www.sunpack.com/RTF.
You may contact the author: Chris Hector at
chris@sunpack.com
- Added HEIGHT and WIDTH attributes to regular inline images. This will improve
the performance for most browsers, since they do not need to download the
images prior to rendering the entire page. It also improves the rendering,
since images that were re-sized in the RTF document will now be displayed in
HTML at the correct size. Note that cropped images may not be properly cropped,
since that can only be done by the graphics filtering program. Also, images
that are not embedded in the RTF file (linked graphics) do not generate HEIGHT
and WIDTH attributes.
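As a rough illustration of the entry above, the following sketch builds an inline image tag that carries HEIGHT and WIDTH when the size is known and omits them otherwise (as for linked graphics). The function name img_tag and its parameters are hypothetical; this is not the filter's own code.

```python
import html


def img_tag(src, width=None, height=None):
    """Build an <IMG> tag; size attributes are added only when both are known."""
    attrs = ['SRC="%s"' % html.escape(src, quote=True)]
    if width is not None and height is not None:
        # Sizes are assumed to be in pixels already (an assumption of this sketch).
        attrs.append("WIDTH=%d" % int(width))
        attrs.append("HEIGHT=%d" % int(height))
    return "<IMG " + " ".join(attrs) + ">"


# Embedded image with a known size, then a linked graphic without one.
print(img_tag("fig1.gif", 120, 80))
print(img_tag("linked.gif"))
```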
- Added the ability to process multiple files on UNIX and under DOS. This
is done by adding additional options/filenames on the end of the command.
- RTF files containing links to images will now have <IMG> (inline
images) generated instead of <A HREF...> (links).
- The -h0 (split level) option is now automatically set if any of the
following are specified. If a different split level is specified by the user,
it takes precedence.
- -c (Table of Contents)
- -t (Refs on top)
- -F (Frames)
- -N (Custom Navigation Panel)
- -x (Index)
- Made -F (Frames) imply -c (Table of Contents)
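The defaulting rules in the two entries above could be sketched like this. Only the option letters come from the notes; the dictionary representation and the name resolve_options are assumptions made for illustration, not the filter's actual option handling.

```python
# Hypothetical sketch of the option defaults described above.
IMPLIES_SPLIT = {"c", "t", "F", "N", "x"}  # options that switch on -h0


def resolve_options(given):
    """Return the effective options for a dict of user-supplied options.

    `given` maps an option letter to its value: True for plain flags,
    a file name for -N, an int for the -h split level.
    """
    opts = dict(given)
    # -F (Frames) implies -c (Table of Contents).
    if opts.get("F"):
        opts["c"] = True
    # Any of the listed options sets split level 0, unless the user
    # gave a split level explicitly, which takes precedence.
    if "h" not in given and any(opts.get(o) for o in IMPLIES_SPLIT):
        opts["h"] = 0
    return opts


print(resolve_options({"F": True}))
print(resolve_options({"c": True, "h": 2}))
```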
- Changed numeric suffixes for output file names so that they sort in
numerical order.
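The notes do not say which naming scheme the filter uses. One common way to make numeric suffixes sort numerically is zero-padding, sketched below; the function name, the pad width of 3, and the ".html" extension are assumptions.

```python
def numbered_name(base, n, width=3, ext=".html"):
    """Build an output file name with a zero-padded numeric suffix."""
    return "%s%0*d%s" % (base, width, n, ext)


padded = [numbered_name("doc", n) for n in (1, 2, 10)]
unpadded = ["doc%d.html" % n for n in (1, 2, 10)]

# With padding, a plain lexical sort keeps the numerical order;
# without it, "doc10.html" sorts before "doc2.html".
print(sorted(padded))
print(sorted(unpadded))
```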
- Fixed a bug in Mac version where -G was not being added
- If the output filename is specified with a .htm extension, that extension
is now used for all HTML filenames.
- Fixed unix makefile to build rtftohtml.bin and shell to invoke it
- modified htmlunix.c and fileops.c to allow passing of LIBDIR as non-string
- Added bugfix to correct NULL font name problem
(ewiles@mclean.sterling.com)
- Corrected a bug in footnotes which caused footnote reference numbers to
be off by one. It was an increment variable side effect that caused the problem,
so it only appeared in some compilers!
- Fixed a bug in XformStr where global substitutions were not correctly
handled (move to end of string was bad value). This showed up when links to
pictures were used.
- Corrected a bug that caused visible index entries to be discarded (even
though the index was correctly generated.)
- fixed regexp prototypes in Unix source distribution
- Added nav-panl and graphic files to Unix distributions
- changed standardcharnames.h to stdcharn.h (in d and e)
- Default GIF files used in the navigation panel now have 8 character (or
less) file names to allow them to be used and properly distributed on DOS/WIN
3.1 platforms.
- Modified the file names to eight characters for translation
files:
html-trans becomes html-trn and nav-panel becomes nav-panl
- The filter now supports RTF version 1.4 (Microsoft Word version 7) files.
- Lists are now supported within table cells.
- Mac 68k version had intermittent memory errors resulting in freezes and
system crashes. This has been corrected.
- Mac version progress bar was not displayed until the second pass through
the file. This was only noticeable on very large files, but now both passes are
represented.
- Modified table support to allow images
- Modified timing to improve performance on the Macintosh and give more
cycles to the filter when running in the background.
- Added support for HTML 3.0 tables
- RTFtoWEB support incorporated into the main filter. All platforms now have
identical functionality. The RTFtoWEB code was written by Christian Bolik, and
it includes many features:
- A single RTF document can be split into a collection of HTML documents
that are linked together by navigation panels and a common Table of Contents
and Index.
- The Table of Contents and Index are generated automatically and contain
hypertext links to the appropriate sections within the document.
- fully customizable navigation panels on top and bottom of each page
- active Cross references to headings, mail addresses and URLs
- support for a few of Netscape's HTML extensions, such as the
<center> tag and background images
- Netscape 2.0 Frames are supported to allow the Table of Contents to appear
as a separate frame.
- Short file names can be generated for DOS and Win 3.1 environments.
- PowerPC native support for Macintosh
- Added support for subscript and superscripts
- Error messages now display text prior to where the error occurred so that
you can find the location of the problem in your input source.
- Changed footnotes processing so that a separate file is created only when
file splitting is enabled. Footnotes are now always embedded in the current
file at the end. They also have a "back" feature that returns you to the
originating reference when you click on the footnote.
- Documents that link to graphic files (instead of embedded graphics) are
now supported.
- Character translation has been improved by providing a complete mapping
for ansi and Macintosh character sets.
- Corrected bug that caused text to be truncated on 80 column boundaries.
- Corrected bug that caused hard-return (\line) to be lost
- Removed the error which caused the following messages :
- ReadStyleSheet: unknown token "\widctlpar"
- Unknown symbol... near line .. position ....
- The filter now correctly deals with documents that contain revision marks.
- Footnotes where the footnote reference is formatted differently in the
footnote than in the body of the document are now processed correctly.
- Corrected many memory overruns that caused errors and crashes.
- Long lines of text (>80 characters) with no spaces are now properly handled.
- Table of contents now generates properly nested anchors in all cases.
Among the new features are:
- rtftoweb now creates clean HTML
- the layout of the navigation panel now is fully customizable
- Netscape users can specify additional layout features
- the <center> tag of Netscape is supported
- empty headings are ignored
- if email addresses and URLs are colored red in the RTF file, an HTML
hyperlink will automatically be generated
- some bug fixes
- New command line option: -s, tells rtftoweb to use short, numbered file
names if the HTML output is to be split.
- rtftoweb now checks if a heading is longer than 256 characters and ignores
it if this is the case. A warning is printed, too, since the author most
certainly didn't want such a long heading in his/her document.
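A minimal sketch of the heading-length check described above, assuming the limit is applied to the character count of the heading text. The names MAX_HEADING and check_heading, and the wording of the warning, are hypothetical.

```python
import sys

MAX_HEADING = 256  # limit taken from the entry above


def check_heading(text, warn=sys.stderr):
    """Return the heading text, or None (with a warning) if it is too long."""
    if len(text) > MAX_HEADING:
        print("warning: ignoring heading of %d characters" % len(text), file=warn)
        return None
    return text


print(check_heading("Introduction"))
print(check_heading("x" * 300))
```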
- Auto-numbered headings are now fully supported.
- Internal cross references (created by text formatted as "_Name" and
"_Href") are now correctly resolved across file boundaries.
- FlushHTML() rewritten so that no newline is written to the output file
when this proc is invoked.
Changed two things in html-trn to work better with WinWord:
- Added "Courier New" to the list of monospaced fonts and added a line for
hidden and strikethrough text to generate literal HTML.
- If the file "nav-panl" exists in rtftohtml's library directory (that is
where html-map etc. live), it is automatically read if no -N option was given
to rtftohtml.
- Email addresses and URIs in the RTF text are now also recognized as HTML
references if they are colored red (as is usual for cross references inside the
document).
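The red-text rule above might be approximated as follows. This is a sketch under stated assumptions: the two regular expressions are deliberately simple, and the function name link_red_text and the is_red flag are hypothetical stand-ins for however the filter tracks text color.

```python
import html
import re

# Simplified patterns, assumed for this sketch only.
URL_RE = re.compile(r"^[a-z][a-z0-9+.-]*://\S+$", re.IGNORECASE)
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def link_red_text(text, is_red):
    """Wrap a run of text in an anchor if it is red and looks like a URL or email."""
    escaped = html.escape(text)
    if not is_red:
        return escaped
    if URL_RE.match(text):
        return '<a href="%s">%s</a>' % (escaped, escaped)
    if EMAIL_RE.match(text):
        return '<a href="mailto:%s">%s</a>' % (escaped, escaped)
    # Red text that is neither a URL nor an address is left alone here.
    return escaped


print(link_red_text("chris@sunpack.com", True))
print(link_red_text("http://www.sunpack.com/RTF", True))
```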
- Fixed bug in htmlout.c (nextfield) which caused the 81st character of
non-wrappable lines to be lost.
- Work-around for a bug introduced by Winword 6.0: It pretends that the
character set used in an RTF file written by it is always ANSI, although, when
the source of the RTF-file was a Mac-Word-document, the character set is still
MAC. The shell-script rtfcharset can be used to fix this, e.g. by saying
"rtfcharset foo.rtf mac".
- The navigation panel now is completely configurable! This is accomplished
by using a simple configuration file which specifies the panel elements and
thus allows for navigation buttons to be images. The conf-file is given to
rtftohtml via the -N option.
- Some more Netscapisms: The nav-panl config-file can also be used to
specify background images and colors, text color and the appearance of
horizontal rules.
* Empty headlines are ignored now.
* Corrected HTML of footnote and index files.
* Added support for Netscape's <center> tag.
- html-map: &nbsp; is not supported by any HTML-browser I know, mapped
nobrkspace to " "
- text of optional RTF-destinations in headings is now ignored
- removed interpretation of bookmarks as being anchors (this was not working
right)
- added support for Castilian Spanish (thanks to Pedro Bayon)
- fixed a bug that caused image files to be overwritten (thanks to Toni
Verdu)
- changed <a><h?>...</h?></a> to
<h?><a>...</a></h?>
- The rtftoweb patches now generate clean HTML!
- Anchors of index anchors don't contain any text anymore, since Mosaic and
the Mosaic-based Chimera are the only browsers which (falsely) require
non-empty anchors. An anchor text may be specified using the command line
option "-X <text>". <text> could be for example · as
suggested by Alan Flavell.
* removed bug in hierHeadingLevel() that caused frequent segmentation
faults
* added support for French and Italian, thanks to
achille@aisws5.cern.ch
* works with rtftohtml-2.7.5 (without a change)
* some minor changes concerning file names and array allocation
* FINAL VERSION OF rtftoweb! (see above for details)
- rtftoweb now uses html-trn to identify a heading style as such. (Until now
heading styles had to be named "heading 1" etc. Thanks Andrew.)
- This version of rtftoweb *requires* rtftohtml 2.7.4 (tired of keeping
things compatible with older versions of rtftohtml).
- made rtftoweb avoid '/' and '+' in filenames
- check for RTFGetStyle() == NULL (Thanks Justin!)
- removed bug in hierIntList
- several minor changes
- added -T cmdline option to specify the document title
- updated to work with rtftohtml-2.7.3
- more verbose "usage"-string, now shows options and versions
- no more ':' in filenames (newHierFile)
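The notes name three characters that rtftoweb keeps out of generated file names: '/', '+' and ':'. A sketch of such a filter is shown below; the replacement character "_" and the name safe_filename are assumptions, since the notes do not say what the characters are replaced with.

```python
UNSAFE = "/+:"  # characters the change notes say are avoided


def safe_filename(name, replacement="_"):
    """Replace characters that are awkward in file names."""
    return "".join(replacement if ch in UNSAFE else ch for ch in name)


print(safe_filename("1.2: I/O + more"))
```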
- new title for html-documents
- improved -h0 behaviour
- bug fixes
- updated to work with rtftohtml-2.7
- Changed the processing of line breaks (\line in RTF,
<shift><return> in MS Word). These were being treated the same as
paragraph marks, but they should have been treated as a simple break. Where
this makes a difference is in lists.
The filter now correctly renders
the following example as a single list element (i.e. one bullet mark) across
two lines.
* some list element\line
more stuff belonging to same list element, but on a different line.
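The difference between a line break and a paragraph mark can be sketched as follows. The sketch assumes each list item arrives as one string with \line marking soft breaks, and that a soft break is written as <br>; both the representation and the name render_list are hypothetical, and splitting on the literal text "\line" is a simplification of real RTF tokenizing.

```python
def render_list(items):
    """Render list items as an HTML unordered list.

    Each item is one paragraph; "\\line" inside an item is a soft break,
    so it stays inside the same <li> instead of starting a new one.
    """
    out = ["<ul>"]
    for item in items:
        parts = [part.strip() for part in item.split("\\line")]
        out.append("<li>" + "<br>\n".join(parts) + "</li>")
    out.append("</ul>")
    return "\n".join(out)


print(render_list(["some list element\\line\nmore stuff on a different line"]))
```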
- MSWord 6.0 for the Macintosh is now using 'RTF ' as the file type for rtf
files, instead of TEXT. This caused rtftohtml to ignore those files in the OPEN
file dialog box. You could not drag and drop those RTF files onto the filter
either. The filter now accepts file types of RTF or TEXT.
IMPORTANT!
To make drag-n-drop of MSWord 6.0 RTF files work on your Macintosh, you
must:
- Download rtftohtml 2.7.5 onto your Macintosh
- Copy ALL PREVIOUS VERSIONS of rtftohtml onto a backup disk. Then REMOVE
them from your hard-disk.
- Restart your Macintosh, while holding down the OPTION and Apple keys. This
will re-build your desktop.
- Fixed a bug where column widths of 1 character caused an infinite loop.
- Fixed a problem for NeXT RTF files where \ followed by a newline was not
being interpreted as a \par mark.
- Footnote numbers were not being reset at the start of each file, so the
second document converted would have footnotes starting at > 1.
- Some Word 6.0 RTF symbols were improperly coded as destinations. This
caused rtftohtml to discard the rest of the file (searching for a closing '}' )
- Modified status bar for Mac platform - it was incorrect for large files.
- Changed error messages to print line and column information rather than
context.
- Modified WMF processing to produce the correct WMF header
- Added code to recognize that style names of the form "Normal, fred"
actually mean that either Normal or fred is recognized as the style name.
(Implies that style names with ',' in html-trn are incorrect.)
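The comma rule above can be illustrated with a short sketch: the style name is split into aliases and the first one found among the known styles is used. The function names and the "first alias wins" choice are assumptions of this sketch.

```python
def style_aliases(name):
    """Split a style name such as "Normal, fred" into its aliases."""
    return [part.strip() for part in name.split(",") if part.strip()]


def match_style(rtf_style_name, known_styles):
    """Return the first alias that appears in known_styles, or None."""
    for alias in style_aliases(rtf_style_name):
        if alias in known_styles:
            return alias
    return None


print(style_aliases("Normal, fred"))
print(match_style("Normal, fred", {"fred", "heading 1"}))
```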
- Added -V option to UNIX version
- Corrected usage message on UNIX version
- Footnotes are now written to a separate file (filename_fn.html). This
allows the <back> button of all browsers to navigate back from a footnote
to the footnote reference.
- A table of contents file is now generated using the text from all headings
<H1>..<Hn> and any table of contents entries in the RTF. The table
of contents has hypertext links from the TOC entries to their locations in the
main HTML text. The -T option disables this feature.
- Generation of graphics files can be disabled through the -G option. The
filter still generates the hypertext links to the graphics files, but it does
not re-write the files themselves. This improves performance for graphics
intensive RTF files.
- Fixed a bug where table of contents entries and index entries were not
appearing in the HTML output.
- Fixed a bug that caused aborts when graphics occurred in a table.
- rtftohtml now closes off any "text" markup
(<em>,<b>,<cite>) before emitting a paragraph tag. This more
closely matches the HTML DTD and corrects a few error conditions.
- The format of the html-trn file has changed. You will need to use the
html-trn file from this distribution and then add any changes that you made to
customize your html-trn.
- Footnotes no longer require special styles in order to work. This means
that all superscripts can now be translated to <sup></sup> or any
other markup you prefer.
- The Macintosh interface now allows the file Creator to be selected for
graphics, HTML and error files.
- Fixed a bug that caused formatted text in tables to be lost or added to
the wrong cells.
The new format file has an additional item added to the .PTag
table. An entry now looks like this:
#"name","starttag","endtag","col2mark","tabmark","parmark",allowtext,cannest,DeleteCol1,fold
.PTag
...
"Normal","","\n","\t","\t","<p>\n",1,0,0,1
"pre","<pre>","</pre>","\t","\t","\n",0,0,0,0
The
fold field should be set to one (1) for all entries except
"pre" or "listing". An entry of one allows rtftohtml to insert
newlines "\n" into the source so that lines do not get too long in
your HTML file. This feature is designed to make editing/viewing of HTML source
files easier with editors like vi. You must use zero (0) for "pre" and
"listing" because newlines are significant characters for these markups.
You must add the following entry to your .PTag table. This is required
to support table translation.
# This is a required entry; tables will be formatted with this entry
"_Table","<pre>","</pre>","\t","\t","\n",0,0,0,0
You must add the following entry to your .PMatch table:
# This is a required entry; tables will be formatted with this entry
"_Table",0,"_Table"
rtftohtml will now translate tables into preformatted ascii tables.
Tables cannot contain any markup. That is, BOLD, ITALIC
etc. will be ignored. Also tables cannot contain anchors, footnotes, or hidden
text.
The filter will produce a table that has the same number of columns as the
input. The size of the columns will be proportional to the sizes of the
input columns. Note that there is no guarantee that text that fit in your
original table will still fit in the formatted version. This is because the
point size of the input may not be proportional to the fixed-width, 80-column
output. If you do not like the look of the formatted table, you must re-size
the columns in your RTF source.
Columns are always separated by two blanks.
Columns can be left justified, right justified or centered.
Merged cells are allowed.
There is no support for decimal aligned data. This just looked too hard
and prone to error. For now, you will have to make sure that all your data has
the same number of trailing digits and stick to right justified text.
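The layout rules above (proportional column widths, two blanks between columns, left, right or centered text) can be sketched as follows. This is an illustration only: the function names are hypothetical, the 80-character line comes from the description above, and text that is wider than its column simply overflows here, which may differ from what the filter does.

```python
def column_widths(rtf_widths, total=80, gap=2):
    """Scale input column widths to character widths that fit in `total`."""
    usable = total - gap * (len(rtf_widths) - 1)
    whole = sum(rtf_widths)
    # Integer arithmetic keeps the result exact; every column gets >= 1 char.
    return [max(1, w * usable // whole) for w in rtf_widths]


def format_row(cells, widths, aligns):
    """Format one row; aligns holds 'l', 'r' or 'c' for each column."""
    pad = {"l": str.ljust, "r": str.rjust, "c": str.center}
    return "  ".join(pad[a](text, w) for text, w, a in zip(cells, widths, aligns))


widths = column_widths([1440, 2880, 1440], total=24)
print(widths)
print(format_row(["Item", "Price", "Qty"], widths, ["l", "r", "c"]))
```

Right justification is what the note on decimal data relies on: values with the same number of trailing digits line up when padded on the left.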
The filter now supports drag-n-drop on the Macintosh!
Options are set with menu selections!
The signature is even registered with Apple!
Keep all of the translation files (html-trn, ansi-map...) in the same
folder as the application.
Drag any RTF file onto rtftohtml (requires System 7). The output files will
be placed in the same folder as your input file. If there are any errors in
translating your file, the error messages will be put into a file called
filename.err. Some errors will cause rtftohtml to exit; others will
not.
Once rtftohtml is started, it will continue to run! This allows you to
translate many documents at a time. You can drag lots of files onto rtftohtml
and it will translate all of them (unless it encounters a fatal translation
error.)
To use non-default translation options, simply start rtftohtml by double
clicking on it. Then choose your option from the options menu. These settings
will remain in effect until rtftohtml exits. The next time you run rtftohtml
you must choose your option settings again.
You can open RTF files using Open... from the File menu.
This will only work for files with an extension of .rtf (or .RTF). If you want
to open a file without an extension of .rtf, you must hold down the option key
when selecting Open.
The Graphic file extension in the Options menu allows you to change the
default extension for graphic files from ".gif" to an extension of your own
choosing.
The Inline graphics option will (when set) use <IMG SRC=... markup in HTML
files for graphics instead of <A HREF=...
The Auto Filenames option when set will put the HTML output in the same folder
as the input, with a .HTML extension. If you want to specify your own
filenames, turn this option off. Then rtftohtml will prompt you for an output
file name each time a file is translated.
- All paragraph styles that appear in a document but that do not appear in
the html-trn file will be treated as a warning. This provides you with a list
of paragraph styles that you can then add to html-trn.
- You can tell rtftohtml to discard all of the text with a given paragraph
style. This is done by using the "_Discard" in the .PMatch table. For example,
the following entries will discard all text with paragraph styles "toc 1-5" and
"index 1-5". (Word uses these paragraph styles for generating a table of
contents and index. Since these contain page numbers which do not apply to HTML
documents, you may wish to discard them.)
"toc 1",0,"_Discard"
"toc 2",0,"_Discard"
"toc 3",0,"_Discard"
"toc 4",0,"_Discard"
"toc 5",0,"_Discard"
"index 1",0,"_Discard"
"index 2",0,"_Discard"
"index 3",0,"_Discard"
"index 4",0,"_Discard"
"index 5",0,"_Discard"
rtftohtml was generating "unknown symbol" errors for optional
destinations (RTF with a markup of {\*\xxx ... }). The convention for RTF readers
is that any optional destination that a reader does not understand should be
silently ignored. This bug was also causing extra text to be inserted into HTML
files.
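The "silently ignore" convention can be sketched as a brace-matching skip over any group that opens with {\*. This is a simplified illustration, not the filter's reader: it drops every optional destination (a real reader keeps the ones it understands), it does not handle \bin data, and the function names are hypothetical.

```python
def skip_group(rtf, start):
    """Return the index just past the group that opens at rtf[start] == '{'."""
    depth = 0
    i = start
    while i < len(rtf):
        ch = rtf[i]
        if ch == "\\":
            i += 2  # skip the escaped character, e.g. \{ or \}
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unbalanced group")


def drop_optional_destinations(rtf):
    """Remove every {\\*...} group, including any groups nested inside it."""
    out = []
    i = 0
    while i < len(rtf):
        if rtf.startswith("{\\*", i):
            i = skip_group(rtf, i)
        elif rtf[i] == "\\" and i + 1 < len(rtf):
            out.append(rtf[i:i + 2])  # copy escaped pairs intact
            i += 2
        else:
            out.append(rtf[i])
            i += 1
    return "".join(out)


print(drop_optional_destinations(r"{\rtf1 Hello {\*\xxx hidden {nested}} world}"))
```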
- rtftohtml generates empty anchors <a name="myname"></a> where
the "Named Anchor" feature is used. Xmosaic does not support empty anchors,
although other browsers (including Mac Mosaic) do. I have reported this as a
bug to the good folks at NCSA. I am hoping that they fix this so that I don't
have to patch the filter. Stay tuned. - The good folks at NCSA told me not to
wait, so I revised rtftohtml to create anchors with an extra space: <a
name="myname"> </a>
- Embedded objects were not appearing in the HTML output. Now they are!
- Fixed the usage line to reflect that -H is not a legal option.
- Windows Metafiles were given an extension of ".meta". That has been
changed to ".wmf".
- The Mac version failed to find html-trn, html-map when the input file was
not in the same folder.
- Nested lists were not working properly. How did I miss that one :?>
- RTF files created without a leading \pard caused aborts
- Graphics within a HOT text area caused failures instead of generating IMG
links
- Anchor names and Hot Text links were getting an extra ">" inserted.
- Footnotes incorrectly got a "#" at the anchor points.
- Footnote reference marks were generated without any separating text or
space. They should be generated as [1].
- filter did not understand \tql which is generated in FrameMaker RTF
output. This has been added to reader.c and rtf.h in the source directory.
Eventually, these changes will be incorporated in an official distribution of
the RTF package. (Binary versions of rtftohtml contain the fix)
- FrameMaker RTF assumes that \pard implies \plain - MS Word for the Mac
appears to handle it that way, so I have updated the filter.